Here, we are evaluating the ‘druggability’ of viper proteins that are well correlated with variables. The plots at the end of this are labeled by LVs, but the enrichment is actually based on the list of viper proteins that have a are correlated with these LVs.

For this analysis we only care about drugs that are approved for use in humans. We use the DrugCentral list of drugs to filter this out.

Import packages

First, import packages to process and plot the data.

library(dplyr)
library(tidyr)
library(purrr)
library(synapser)
synLogin()
## Welcome, Robert Allaway!
## NULL
library(clusterProfiler)
library(enrichplot)
# library(ggupset)
library(ggplot2)
library(cowplot)

drugcentral_structures <- synTableQuery("SELECT InChIKey as inchikey, INN as name FROM syn21446643", 
                                        includeRowIdAndRowVersion = F)$filepath %>% 
  readr::read_csv()
## 
 [####################]100.00%   1/1   Done...    
Downloading  [####################]100.00%   174.2kB/174.2kB (763.1kB/s) Job-104754102201851515354012594.csv Done...
viper_res <- synTableQuery('SELECT * FROM syn21260871 where numSamps = 77', includeRowIdAndRowVersion = F)$filepath %>% 
  readr::read_csv()
## 
Building the CSV... [##------------------]10.75%   134868/1254952       
Building the CSV... [#######-------------]35.16%   441248/1254952       
Create CSV FileHandle [##########----------]50.00%   627485/1254952       
Create CSV FileHandle [####################]100.00%   1254952/1254952   Done...    
Downloading  [#-------------------]7.24%   2.0MB/27.6MB (2.7MB/s) Job-104754116824353411733390217.csv     
Downloading  [###-----------------]14.48%   4.0MB/27.6MB (4.7MB/s) Job-104754116824353411733390217.csv     
Downloading  [####----------------]21.72%   6.0MB/27.6MB (6.2MB/s) Job-104754116824353411733390217.csv     
Downloading  [######--------------]28.96%   8.0MB/27.6MB (7.3MB/s) Job-104754116824353411733390217.csv     
Downloading  [#######-------------]36.20%   10.0MB/27.6MB (8.1MB/s) Job-104754116824353411733390217.csv     
Downloading  [#########-----------]43.44%   12.0MB/27.6MB (8.7MB/s) Job-104754116824353411733390217.csv     
Downloading  [##########----------]50.68%   14.0MB/27.6MB (9.1MB/s) Job-104754116824353411733390217.csv     
Downloading  [############--------]57.92%   16.0MB/27.6MB (9.6MB/s) Job-104754116824353411733390217.csv     
Downloading  [#############-------]65.16%   18.0MB/27.6MB (10.1MB/s) Job-104754116824353411733390217.csv     
Downloading  [##############------]72.40%   20.0MB/27.6MB (10.4MB/s) Job-104754116824353411733390217.csv     
Downloading  [################----]79.64%   22.0MB/27.6MB (10.7MB/s) Job-104754116824353411733390217.csv     
Downloading  [#################---]86.89%   24.0MB/27.6MB (11.0MB/s) Job-104754116824353411733390217.csv     
Downloading  [###################-]94.13%   26.0MB/27.6MB (11.1MB/s) Job-104754116824353411733390217.csv     
Downloading  [####################]100.00%   27.6MB/27.6MB (11.3MB/s) Job-104754116824353411733390217.csv Done...
dtex_structures <- synTableQuery('SELECT inchikey, internal_id FROM syn17090819', includeRowIdAndRowVersion = F)$filepath %>% 
  readr::read_csv()
## 
Create CSV FileHandle [##########----------]50.00%   313552/627104       
Create CSV FileHandle [####################]100.00%   627104/627104   Done...    
Downloading  [###-----------------]17.31%   2.0MB/11.6MB (2.7MB/s) Job-104754126606391268633865936.csv     
Downloading  [#######-------------]34.61%   4.0MB/11.6MB (4.8MB/s) Job-104754126606391268633865936.csv     
Downloading  [##########----------]51.92%   6.0MB/11.6MB (6.2MB/s) Job-104754126606391268633865936.csv     
Downloading  [##############------]69.23%   8.0MB/11.6MB (7.3MB/s) Job-104754126606391268633865936.csv     
Downloading  [#################---]86.53%   10.0MB/11.6MB (8.1MB/s) Job-104754126606391268633865936.csv     
Downloading  [####################]100.00%   11.6MB/11.6MB (8.6MB/s) Job-104754126606391268633865936.csv Done...
dtex_targets <- feather::read_feather(synGet('syn20700199')$path)

dtex_targets <- dtex_targets %>% 
  filter(mean_pchembl > 7) %>% 
  mutate(gene= hugo_gene) %>% 
  left_join(dtex_structures) %>% 
  inner_join(drugcentral_structures)

Significance testing

First, create a list of all druggable genes. We’re using the drug-target explorer data, for any gene for which there is drug-target relationship with a mean_pchembl value >7, which corresponds to 1 uM. This gives us a rough approximation of druggable targets in the human genome but could be a bit under conservative.

Then, we’ll use the viper proteins ranked by correlation with latent variable expression, by latent variable (so each group of genes is based on a single latent variable expression, where each gene in the group is a correlated viper protein)- this is based on the threshold in notebook 16.

Then, we’ll perform a weighted Kolmogorov-Smirnov test using the GSEA function from the clusterProfiler package to assess whether any of the viper genes are enriched in the universe of druggable genes. Here, we’re treating each LV as the ranked list of viper genes (for standard GSEA, this would be the differentially expressed genes, for example) and treating the list of all drug targets as the gene set we’re looking for enrichment in.

term2gene <- dtex_targets %>% 
  mutate(term = name) %>% 
  select(term, gene) %>% 
  distinct()

tidy <- viper_res %>% 
  group_by(latent_var) %>% 
  nest() %>% 
  mutate(data = lapply(data, function(x){
    geneList <- x$corVal 
    names(geneList) <- as.character(x$gene) 
    geneList <- sort(geneList, decreasing = TRUE)
    geneList
  })) ##nest gene lists, looking at top 5% of genes in each LV 

res <- parallel::mclapply(tidy$data, function(x){
  GSEA(geneList = x, TERM2GENE = term2gene)
}, mc.cores = parallel::detectCores())

names(res) <- tidy$latent_var

##determine if no results for a given LV
empty_idx <- lapply(res, function(x){
  if(nrow(x@result)==0){
    TRUE
  }else{
    FALSE
  }
})

##remove if no results
res_complt <- res[empty_idx==F]

time to do ALL OF THE PLOTS!

First, dotplot:

targ_counts <- dtex_targets %>% 
  group_by(name) %>% 
  summarize(count = n())

plots <- lapply(names(res_complt), function(x){ 
  foo <- res_complt[[x]]
  
  show <- c(foo$Description[order(foo$enrichmentScore, decreasing=TRUE)][1:10],
            foo$Description[order(foo$enrichmentScore, decreasing=FALSE)][1:10])

  plt1 <- dotplot(foo, showCategory = show, x = "count", orderBy = "Count") + 
    theme_bw() + theme(legend.position="left") +
    scale_y_discrete(label = function(y) stringr::str_trunc(y, 30))

 
  targets <- targ_counts %>% filter(name %in% show)
             
  bar <- fortify(foo, showCategory = show, split = NULL) %>% 
    full_join(targets, by = c("ID"="name")) %>% 
    mutate(x=eval(parse(text="Count")))

  idx <- order(bar[['Count']], decreasing = TRUE)
  bar$Description <- factor(bar$Description, rev(unique(bar$Description[idx])))

  plt2 <- ggplot(data = bar) +
     geom_bar(aes(x=Description, y = count), stat = "identity") + 
     coord_flip() + 
    theme_bw() +
    theme(axis.title.y=element_blank(),
        axis.text.y =element_blank(),
        axis.ticks.y=element_blank()) +
    labs(y = "Total number of targets")
    
  
  legend <- cowplot::get_legend(
  # create some space to the left of the legend
  plt1 + theme(legend.box.margin = margin(0, 0, 0, 12))
)
  
  plots <- cowplot::plot_grid(plt1 + theme(legend.position = 'none'),
                     plt2,
                     legend,
                     ncol = 3,
                     rel_widths = c(3,1,1), align='h', axis = 'left')
  
  title <- ggdraw() + 
  draw_label(
    x,
    fontface = 'bold',
    x = 0,
    hjust = 0
  ) +
  theme(
    plot.margin = margin(0, 0, 0, 7)
  )

  plot_grid(
  title, plots,
  ncol = 1,
  # rel_heights values control vertical title margins
  rel_heights = c(0.1, 1)

  )
})

plots
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

Then, network plots:

plots <- lapply(names(res_complt), function(x){ 
  foo <-res_complt[[x]]
  show <- c(foo$Description[order(foo$NES, decreasing=TRUE)][1:10],
            foo$Description[order(foo$NES, decreasing=FALSE)][1:10])
  plt2 <- cnetplot(res_complt[[x]], categorySize="pvalue", showCategory = show,
                    foldChange=tidy$data[tidy$latent_var==x] %>% unlist) +
     ggplot2::ggtitle(x)
})

plots
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

Then a heatmap of the top 20 by p value:

plots <- lapply(names(res_complt), function(x){ 
    foo <-res_complt[[x]]
  show <- c(foo$Description[order(foo$NES, decreasing=TRUE)][1:10],
            foo$Description[order(foo$NES, decreasing=FALSE)][1:10])
  plt3 <- heatplot(res_complt[[x]], showCategory = show) + ggplot2::ggtitle(x)
})

plots
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

Then a ridgeplot of the top 30 - this function doesn’t let you filter by name, so it’s just a selection of 30 most significant in both directions - this is good for demonstrating directionality. Some of these are VIPER proteins that are negatively, not positively, correlated with the LVs:

plots <- lapply(names(res_complt), function(x){ 
    foo <-res_complt[[x]]
  show <- c(foo$Description[order(foo$NES, decreasing=TRUE)][1:10],
            foo$Description[order(foo$NES, decreasing=FALSE)][1:10])
  plt4 <- ridgeplot(res_complt[[x]], showCategory = 30) + ggplot2::ggtitle(x)
})

plots
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

table of all significant results

tab <- lapply(names(res_complt), function(x){ 
  foo <- res_complt[[x]]@result %>% 
    mutate(latent_var = x) %>% 
    select(latent_var, everything())
}) %>% bind_rows()

# lv_drug_tab <- synBuildTable("Latent Variable Drug Set Enrichment Analysis", "syn21046734", tab)
# synStore(lv_drug_tab)

DT::datatable(tab)
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] cowplot_1.0.0          ggplot2_3.2.1          enrichplot_1.6.0      
## [4] clusterProfiler_3.14.0 synapser_0.6.61        purrr_0.3.3           
## [7] tidyr_1.0.0            dplyr_0.8.3           
## 
## loaded via a namespace (and not attached):
##   [1] bit64_0.9-7           RColorBrewer_1.1-2    progress_1.2.2       
##   [4] httr_1.4.1            tools_3.6.1           backports_1.1.5      
##   [7] DT_0.10               R6_2.4.1              DBI_1.0.0            
##  [10] lazyeval_0.2.2        BiocGenerics_0.32.0   colorspace_1.4-1     
##  [13] withr_2.1.2           tidyselect_0.2.5      gridExtra_2.3        
##  [16] prettyunits_1.0.2     bit_1.1-14            compiler_3.6.1       
##  [19] Biobase_2.46.0        xml2_1.2.2            labeling_0.3         
##  [22] triebeard_0.3.0       scales_1.1.0          readr_1.3.1          
##  [25] ggridges_0.5.1        stringr_1.4.0         digest_0.6.23        
##  [28] rmarkdown_1.17        DOSE_3.12.0           pkgconfig_2.0.3      
##  [31] htmltools_0.4.0       fastmap_1.0.1         htmlwidgets_1.5.1    
##  [34] rlang_0.4.2           RSQLite_2.1.4         shiny_1.4.0          
##  [37] gridGraphics_0.4-1    farver_2.0.1          jsonlite_1.6         
##  [40] crosstalk_1.0.0       BiocParallel_1.20.0   GOSemSim_2.12.0      
##  [43] magrittr_1.5          feather_0.3.5         ggplotify_0.0.4      
##  [46] GO.db_3.10.0          Matrix_1.2-17         Rcpp_1.0.3           
##  [49] munsell_0.5.0         S4Vectors_0.24.1      viridis_0.5.1        
##  [52] lifecycle_0.1.0       stringi_1.4.3         yaml_2.2.0           
##  [55] ggraph_2.0.0          MASS_7.3-51.4         plyr_1.8.4           
##  [58] qvalue_2.18.0         grid_3.6.1            blob_1.2.0           
##  [61] promises_1.1.0        parallel_3.6.1        ggrepel_0.8.1        
##  [64] DO.db_2.9             crayon_1.3.4          lattice_0.20-38      
##  [67] graphlayouts_0.5.0    splines_3.6.1         hms_0.5.2            
##  [70] zeallot_0.1.0         knitr_1.26            pillar_1.4.2         
##  [73] fgsea_1.12.0          igraph_1.2.4.2        reshape2_1.4.3       
##  [76] codetools_0.2-16      PythonEmbedInR_0.3.37 stats4_3.6.1         
##  [79] fastmatch_1.1-0       glue_1.3.1            evaluate_0.14        
##  [82] BiocManager_1.30.10   data.table_1.12.6     httpuv_1.5.2         
##  [85] vctrs_0.2.0           tweenr_1.0.1          urltools_1.7.3       
##  [88] gtable_0.3.0          polyclip_1.10-0       assertthat_0.2.1     
##  [91] pack_0.1-1            xfun_0.11             ggforce_0.3.1        
##  [94] mime_0.7              europepmc_0.3         xtable_1.8-4         
##  [97] tidygraph_1.1.2       later_1.0.0           viridisLite_0.3.0    
## [100] tibble_2.1.3          rvcheck_0.1.7         AnnotationDbi_1.48.0 
## [103] memoise_1.1.0         IRanges_2.20.1